6 research outputs found
Topological data analysis of human vowels: Persistent homologies across representation spaces
Topological Data Analysis (TDA) has been successfully used for various tasks
in signal/image processing, from visualization to supervised/unsupervised
classification. Often, topological characteristics are obtained from persistent
homology theory. The standard TDA pipeline starts from the raw signal data or a
representation of it. Then, it consists in building a multiscale topological
structure on the top of the data using a pre-specified filtration, and finally
to compute the topological signature to be further exploited. The commonly used
topological signature is a persistent diagram (or transformations of it).
Current research discusses the consequences of the many ways to exploit
topological signatures, much less often the choice of the filtration, but to
the best of our knowledge, the choice of the representation of a signal has not
been the subject of any study yet. This paper attempts to provide some answers
on the latter problem. To this end, we collected real audio data and built a
comparative study to assess the quality of the discriminant information of the
topological signatures extracted from three different representation spaces.
Each audio signal is represented as i) an embedding of the observed data in a
higher-dimensional space using Takens' representation, ii) a spectrogram viewed
as a surface in a 3D ambient space, iii) the set of the spectrogram's zeros.
From vowel audio recordings, we use topological signatures for three prediction
problems: speaker gender, vowel type, and individual identity. We show that a
topologically augmented random forest improves the Out-of-Bag (OOB) error over
one based solely on Mel-Frequency Cepstral Coefficients (MFCCs) for the last
two problems. Our results also suggest that the topological information
extracted from different signal representations is complementary, and that the
spectrogram's zeros offer the best improvement for gender prediction.
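The first representation above, a delay (Takens) embedding, maps a 1-D signal to a point cloud on which persistent homology can be computed. A minimal sketch follows; the embedding dimension and delay are illustrative choices, not the parameters used in the paper:

```python
import numpy as np

def takens_embedding(x, dim=3, delay=1):
    """Delay-coordinate (Takens) embedding of a 1-D signal.

    Maps x to the point cloud {(x[t], x[t+delay], ..., x[t+(dim-1)*delay])}
    in R^dim; a TDA pipeline then builds a filtration on this cloud.
    """
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay : i * delay + n] for i in range(dim)], axis=1)

# A pure sine traces a closed loop in the embedded space, which
# persistent homology would detect as a persistent 1-dimensional cycle.
t = np.linspace(0, 4 * np.pi, 400)
cloud = takens_embedding(np.sin(t), dim=2, delay=25)
print(cloud.shape)  # (375, 2)
```

For real vowels, the loop structure reflects the quasi-periodicity of voiced speech; the delay is typically tuned (e.g., via autocorrelation) rather than fixed as here.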
Detecting human and non-human vocal productions in large scale audio recordings
We propose an automatic data processing pipeline to extract vocal productions
from large-scale natural audio recordings. Through a series of computational
steps (windowing, creation of a noise class, data augmentation, re-sampling,
transfer learning, Bayesian optimisation), it automatically trains a neural
network for detecting various types of natural vocal productions in a noisy
data stream without requiring a large sample of labeled data. We test it on two
different data sets, one from a group of Guinea baboons recorded from a primate
research center and one from human babies recorded at home. The pipeline trains
a model on 72 and 77 minutes of labeled audio recordings, reaching accuracies
of 94.58% and 99.76%, respectively. It is then used to process 443 and 174
hours of continuous natural recordings, creating two new databases of 38.8 and
35.2 hours, respectively. We discuss the strengths and limitations of this
approach, which can be applied to any massive audio recording.
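The windowing step of such a pipeline can be sketched as follows; the sampling rate, window size, and hop size are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def sliding_windows(signal, sr, win_s=1.0, hop_s=0.5):
    """Cut a long recording into fixed-size, overlapping windows.

    Each window can then be passed to a classifier that labels it as a
    vocal production or as noise; win_s/hop_s here are placeholder values.
    """
    win, hop = int(win_s * sr), int(hop_s * sr)
    starts = range(0, max(len(signal) - win + 1, 1), hop)
    return np.stack([signal[s : s + win] for s in starts])

sr = 16000
audio = np.zeros(5 * sr)   # 5 seconds of silence as a stand-in signal
chunks = sliding_windows(audio, sr)
print(chunks.shape)        # (9, 16000)
```

Overlapping windows trade extra computation for robustness: a vocalisation that straddles one window boundary is still fully contained in a neighbouring window.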
Detecting non-adjacent dependencies is the exception rather than the rule
Statistical learning refers to our sensitivity to the distributional properties of our environment. Humans have been shown to readily detect the dependency relationship between events that occur adjacently in a stream of stimuli, but processing non-adjacent dependencies (NADs) appears more challenging. In the present study, we tested the ability of human participants to detect NADs in a new Hebb-naming task that was recently proposed to study regularity detection in a noisy environment. In three experiments, we found that most participants did not manage to extract NADs. These results suggest that the ability to learn NADs in noise is the exception rather than the rule, and they provide new information about the limits of statistical learning mechanisms.
Detection of regularities in a random environment
Regularity detection, or statistical learning, is regarded as a fundamental component of our cognitive system. To test the ability of human participants to detect regularity in a more ecological situation (i.e., mixed with random information), we used a simple letter-naming paradigm in which participants were instructed to name single letters presented one at a time on a computer screen. The regularity consisted of a triplet of letters that was systematically presented in that order. Participants were not told about the presence of this regularity. A variable number of random letters were presented between two repetitions of the regular triplet, making this paradigm similar to a Hebb repetition task. Hence, in this Hebb-naming task, we predicted that if any learning of the triplet occurred, naming times for the predictable letters in the triplet would decrease as the number of triplet repetitions increased. Surprisingly, across four experiments, detection of the regularity occurred only under very specific experimental conditions and was far from a trivial task. Our study provides new evidence regarding the limits of statistical learning and the critical role of contextual information in the detection (or not) of repeated patterns.
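The stimulus structure of such a Hebb-naming stream, a fixed triplet embedded among a variable number of random filler letters, can be sketched as follows; the specific letters and gap sizes are illustrative, not the paper's materials:

```python
import random

def hebb_stream(triplet=("K", "R", "T"), n_reps=10, fillers="ABCDEFGH", gap=(2, 5)):
    """Generate a Hebb-naming stimulus stream.

    A fixed, repeated triplet is interleaved with a random number of
    random filler letters (fillers never overlap the triplet letters).
    """
    stream = []
    for _ in range(n_reps):
        stream += random.choices(fillers, k=random.randint(*gap))
        stream += list(triplet)
    return stream

random.seed(0)
stream = hebb_stream()
# The triplet onset "K" appears exactly once per repetition.
print(stream.count("K"))  # 10
```

The prediction in the paradigm is that naming times for "R" and "T" (predictable given "K") should shrink with repetitions if the triplet is learned.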
Detection and classification of vocal productions in large scale audio recordings
We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings and classify them. The pipeline is based on a deep neural network and addresses both tasks simultaneously. Through a series of computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation), it automatically trains a neural network without requiring a large sample of labeled data or substantial computing resources. Our end-to-end methodology can handle noisy recordings made under different recording conditions. We test it on two different natural audio data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home. The pipeline trains a model on 72 and 77 minutes of labeled audio recordings, reaching accuracies of 94.58% and 99.76%, respectively. It is then used to process 443 and 174 hours of continuous natural recordings, creating two new databases of 38.8 and 35.2 hours, respectively. We discuss the strengths and limitations of this approach, which can be applied to any massive audio recording.
Learning Higher‐Order Transitional Probabilities in Nonhuman Primates
The extraction of co-occurrences between two events, A and B, is a central learning mechanism shared by all species capable of associative learning. Formally, the co-occurrence of events A and B appearing in a sequence is measured by the transitional probability (TP) between these events, which corresponds to the probability of the second stimulus given the first (i.e., p(B|A)). In the present study, nonhuman primates (Guinea baboons, Papio papio) were exposed to a serial version of the XOR (i.e., exclusive-OR) problem, in which they had to process sequences of three stimuli: A, B, and C. In this manipulation, first-order TPs (i.e., AB and BC) were uninformative because their transitional probabilities were equal to .5 (i.e., p(B|A) = p(C|B) = .5), while second-order TPs were fully predictive of the upcoming stimulus (i.e., p(C|AB) = 1). In Experiment 1, we found that baboons were able to learn second-order TPs, while no learning occurred on first-order TPs. In Experiment 2, this pattern of results was replicated, and a final test ruled out an alternative interpretation in terms of proximity to the reward. These results indicate that a nonhuman primate species can learn a nonlinearly separable problem such as the XOR. They also provide fine-grained empirical data for testing models of statistical learning on the interaction between the learning of different orders of TPs. Recent bio-inspired models of associative learning are also introduced as promising alternatives for the modeling of statistical learning mechanisms.
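The TP structure of the serial XOR design can be verified with a short sketch; the stimulus labels below are placeholders for the design's abstract categories, not the actual stimuli shown to the baboons:

```python
from collections import Counter

def transitional_probs(sequences, order=1):
    """Estimate transitional probabilities p(next | context) of a given
    order from a list of stimulus sequences."""
    ctx, joint = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - order):
            context, nxt = tuple(seq[i : i + order]), seq[i + order]
            ctx[context] += 1
            joint[(context, nxt)] += 1
    return {k: joint[k] / ctx[k[0]] for k in joint}

# Serial XOR design: four equiprobable triplets; C is determined
# jointly by A and B but by neither of them alone.
seqs = [("A1", "B1", "C1"), ("A1", "B2", "C2"),
        ("A2", "B1", "C2"), ("A2", "B2", "C1")]
tp1 = transitional_probs(seqs, order=1)
tp2 = transitional_probs(seqs, order=2)
print(tp1[(("A1",), "B1")])       # 0.5 -> first-order TPs are uninformative
print(tp2[(("A1", "B1"), "C1")])  # 1.0 -> second-order TPs fully predict C
```

This is exactly why the design is nonlinearly separable: no weighting of single-stimulus cues predicts C, yet the pair (A, B) does so deterministically.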